Manifold-aware ranking kernel for information retrieval

ABSTRACT

A manifold-aware ranking kernel (MARK) for information retrieval is described herein. The MARK is implemented by using supervised and unsupervised learning. MARK is ranking-oriented: the relative comparison formulation directly targets the ranking problem, making the approach well suited to information retrieval. MARK is also manifold-aware: the algorithm is able to exploit information from ample unlabeled data, which helps to improve generalization performance, particularly when there are a limited number of labeled constraints. MARK is nonlinear: as a kernel-based approach, the algorithm is able to produce a highly non-linear metric which is able to model complicated data distributions.

CROSS-REFERENCE TO RELATED APPLICATION

This application claims the benefit of U.S. Provisional Patent Application Ser. No. 61/794,003, filed on Mar. 15, 2013, and titled “MANIFOLD-AWARE RANKING KERNEL FOR INFORMATION RETRIEVAL,” which is hereby incorporated by reference in its entirety for all purposes.

FIELD OF THE INVENTION

The present invention relates to the field of data searching. More specifically, the present invention relates to data searching using information learned from training data.

BACKGROUND OF THE INVENTION

In any information retrieval system, one key component is the measure of similarity between a query object and the objects to retrieve. Although pre-defined metrics are widely used, metrics learned from data have received more and more attention because such metrics are able to adapt to specific properties of the data of interest, resulting in higher retrieval accuracy.

SUMMARY OF THE INVENTION

A manifold-aware ranking kernel (MARK) for information retrieval is described herein. The MARK is implemented by using supervised and unsupervised learning. MARK is ranking-oriented: the relative comparison formulation directly targets the ranking problem, making the approach well suited to information retrieval. MARK is also manifold-aware: the algorithm is able to exploit information from ample unlabeled data, which helps to improve generalization performance, particularly when there are a limited number of labeled constraints. MARK is nonlinear: as a kernel-based approach, the algorithm is able to produce a highly non-linear metric which is able to model complicated data distributions.

In one aspect, a method of manifold-aware ranking kernel learning programmed in a memory of a device comprises performing combined supervised kernel learning and unsupervised manifold kernel learning and generating a non-linear kernel model. Bregman projection is utilized when performing the supervised kernel learning. Unlabeled data is utilized in the unsupervised manifold kernel learning. The result comprises a non-linear metric defined by a kernel model. The supervised kernel learning employs a relative comparison constraint. The device is selected from the group consisting of a personal computer, a laptop computer, a computer workstation, a server, a mainframe computer, a handheld computer, a personal digital assistant, a cellular/mobile telephone, a smart phone, a smart appliance, a gaming console, a digital camera, a digital camcorder, a camera phone, a portable music player, a tablet computer, a video player, a DVD writer/player, a high definition video writer/player, a television and a home entertainment system.

In another aspect, a method of information retrieval programmed in a memory of a device comprises receiving a search query input, performing a search based on the search query input and using a metric kernel learned by manifold-aware ranking kernel learning and presenting a search result of the search. Manifold-aware ranking kernel learning comprises performing combined supervised kernel learning and unsupervised manifold kernel learning and generating a non-linear kernel model. Bregman projection is utilized when performing the supervised kernel learning. Unlabeled data is utilized in the unsupervised manifold kernel learning. The result comprises a non-linear metric defined by a kernel model. The supervised kernel learning employs a relative comparison constraint. The search result comprises a set of entities from a database that are similar to the search query input. The device is selected from the group consisting of a personal computer, a laptop computer, a computer workstation, a server, a mainframe computer, a handheld computer, a personal digital assistant, a cellular/mobile telephone, a smart phone, a smart appliance, a gaming console, a digital camera, a digital camcorder, a camera phone, a portable music player, a tablet computer, a video player, a DVD writer/player, a high definition video writer/player, a television and a home entertainment system.

In another aspect, an apparatus comprises a non-transitory memory for storing an application, the application for: performing combined supervised kernel learning and unsupervised manifold kernel learning and generating a non-linear kernel model and a processing component coupled to the memory, the processing component configured for processing the application. Bregman projection is utilized when performing the supervised kernel learning. Unlabeled data is utilized in the unsupervised manifold kernel learning. The result comprises a non-linear metric defined by a kernel model. The supervised kernel learning employs a relative comparison constraint.

In another aspect, an apparatus comprises a non-transitory memory for storing an application, the application for: receiving a search query input, performing a search based on the search query input and using a metric kernel learned by manifold-aware ranking kernel learning and presenting a search result of the search and a processing component coupled to the memory, the processing component configured for processing the application. Manifold-aware ranking kernel learning comprises: performing combined supervised kernel learning and unsupervised manifold kernel learning and generating a non-linear kernel model. Bregman projection is utilized when performing the supervised kernel learning. Unlabeled data is utilized in the unsupervised manifold kernel learning. The result comprises a non-linear metric defined by a kernel model. The supervised kernel learning employs a relative comparison constraint. The search result comprises a set of entities from a database that are similar to the search query input.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a flowchart of a method of manifold-aware ranking kernel learning according to some embodiments.

FIG. 2 illustrates a flowchart of a method of information retrieval according to some embodiments.

FIG. 3 illustrates a block diagram of an exemplary computing device configured to implement the information retrieval method according to some embodiments.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT

A novel metric learning algorithm, e.g., manifold-aware ranking kernel (MARK), is described herein. MARK learns a positive semi-definite kernel matrix by simultaneously maximizing the relative comparison margins and exploring the intrinsic data manifold structure. An efficient optimization method, Bregman projection, is employed in learning MARK.

The relative comparison formulation of the algorithm directly targets the ranking problem, making the approach well suited to information retrieval. The algorithm is able to exploit information from ample unlabeled data, which helps to improve generalization performance, especially when there are a limited number of labeled constraints. As a kernel-based approach, the algorithm is able to produce a highly non-linear metric which is able to model complicated data distributions.

Although developed with content-based retrieval of images as the main application, the described algorithm is a general metric learning approach that is able to be employed in other information retrieval applications as well.

Data Formulation

Before the details of the learning framework are given, the data formulation is described. It is assumed there are N data points {x₁, x₂, . . . , x_(N)}, where x_(i) ∈ R^(D) (1 ≤ i ≤ N) is a D-dimensional vector representing a single data sample. Among these N data points, the first N₁ points {x₁, x₂, . . . , x_(N₁)} are the training set, and the rest {x_(N₁+1), x_(N₁+2), . . . , x_(N)} are the testing set.

Different from standard supervised classification learning, the class/category label of the training data is not available in the training stage. Instead of class labels, C rank-list constraints {C_(j)}_(j=1)^(C) are used, where C_(j)={x_(j₀); Z_(j)} includes both a query x_(j₀) and a list of relevant feedback Z_(j). Here x_(j₀) and Z_(j) both belong to the training data set and Z_(j)={x_(j₁), x_(j₂), . . . , x_(j_(L_(j)))}, where L_(j) is the length of the list. The list of feedback encodes high-order relative comparison information, e.g., for 1 ≤ r < s ≤ L_(j) (L_(j) ≥ 2), the distance d(x_(j₀), x_(j_(r))) is smaller than d(x_(j₀), x_(j_(s))) (a similarity function is able to be used here in place of the distance function).

For two different constraints, C_(j) and C_(k), their lengths are able to be different, e.g., there are variable-length rank-list constraints. In an extreme case, if all L_(j) are equal to 2 (the minimum possible value), then the rank-list constraints degenerate to the triplet/relative distance constraint, which has been shown to be more flexible than the pair-wise (dis)-similarity constraint in previous metric learning works. Thus, it is able to be seen that the rank-list constraint is the most general constraint encoding the distance information in the metric/kernel learning domain.
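The relationship between a rank list and triplet constraints is able to be illustrated with a minimal Python sketch. The sketch assumes one possible decomposition, in which every ordered pair of feedback items yields one triplet; the function name `rank_list_to_triplets` and the use of integer sample indices are hypothetical choices made for illustration only.

```python
from itertools import combinations

def rank_list_to_triplets(query, feedback):
    """Expand a rank list into (query, nearer, farther) index triplets.

    `feedback` is ordered from most to least relevant, so every pair
    (r, s) with r < s gives one relative comparison constraint.
    """
    # combinations() preserves the input order, so `near` always
    # precedes `far` in the rank list.
    return [(query, near, far) for near, far in combinations(feedback, 2)]

# A list of length L yields L * (L - 1) / 2 triplets; a list of
# length 2 is already a single triplet.
triplets = rank_list_to_triplets(0, [3, 1, 2])
single = rank_list_to_triplets(5, [7, 8])
```

With a feedback list of length 2 the expansion returns exactly one triplet, which matches the degenerate case described above.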

Most learning-to-rank works focus on learning a proper absolute score function for the query, while the focus herein is on the relative comparison.

Loss Function

The linear Mahalanobis distance metric learning problem is first considered, and the following loss function is used:

$\begin{matrix}{\min\limits_{A \succeq 0,\; A_{U} \succeq 0}\; f\left( A,A_{U} \right) + g\left( A_{U},\left\{ x_{i} \right\} \right)\quad \text{s.t.:}\;\; r\left( A,C_{j} \right) \leq 0,\;\; 1 \leq j \leq C} & (1)\end{matrix}$

where A ∈ R^(D×D) and A_(U) ∈ R^(D×D) are the final learned Mahalanobis distance matrix and the unsupervised distance matrix, respectively, both of which must be positive semi-definite (P.S.D.). ƒ is the regularizer which ties the unsupervised learning and the rank-constraint based learning together. g is the data function for learning the unsupervised distance matrix from the data itself without any other information. r is the rank-list constraint function that ties A and the available (rank-list) side information together. Given A ∈ R^(D×D), the distance d^(A)(x_(i), x_(j)) between x_(i) and x_(j) is able to be calculated as Tr(A(x_(i)−x_(j))(x_(i)−x_(j))^(T)).
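The trace form of the distance is able to be checked numerically with a short sketch. It assumes NumPy and randomly generated data; the helper name `mahalanobis_sq` is hypothetical. The sketch relies only on the identity that the trace of A times an outer product of a vector with itself equals the corresponding quadratic form.

```python
import numpy as np

def mahalanobis_sq(A, x_i, x_j):
    """Distance d^A(x_i, x_j) = Tr(A (x_i - x_j)(x_i - x_j)^T)."""
    diff = x_i - x_j
    # Tr(A d d^T) equals the quadratic form d^T A d, which avoids
    # building the D x D outer product explicitly.
    return float(diff @ A @ diff)

rng = np.random.default_rng(0)
D = 4
B = rng.standard_normal((D, D))
A = B @ B.T                          # any B B^T is positive semi-definite
x_i = rng.standard_normal(D)
x_j = rng.standard_normal(D)

d_trace = float(np.trace(A @ np.outer(x_i - x_j, x_i - x_j)))
d_quad = mahalanobis_sq(A, x_i, x_j)
```

Because A is positive semi-definite, the value is never negative, and choosing A as the identity matrix recovers the squared Euclidean distance.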

In the formulation of Equation (1), A and A_(U) are jointly optimized, which enjoys certain advantages but often leads to a difficult optimization problem. In particular, the regularizer and the data cost function are usually quite different from each other, and thus, the following two-step approach is used:

$\begin{matrix}{\mspace{79mu} {{{{step}\; 1\text{:}\mspace{14mu} K_{u}^{*}} = {\text{?}\mspace{14mu} {g\left( {K_{U},\left\{ x_{i} \right\}} \right)}}}\mspace{79mu} {{{step}\; 2\text{:}\mspace{14mu} {\min\limits_{K \geq 0}\mspace{14mu} {f\left( {K,K_{U}^{*}} \right)}}},{{s.t.}:\mspace{14mu} {{r\left( {K,C_{j}} \right)} \leq 0}},{1 \leq j \leq C}}{\text{?}\text{indicates text missing or illegible when filed}}}} & (3)\end{matrix}$

The two steps of Equation (2) are able to be specifically designed (using domain knowledge), and together they pose an easier optimization problem than Equation (1).

Despite the successes of linear distance metric learning, it has been found that real data usually has a complicated and nonlinear structure which possibly is not able to be fully handled by a linear distance metric. Thus, the following nonlinear kernel framework is used,

$\begin{matrix}{\mspace{79mu} {{{{{step}\; 1}:\mspace{14mu} K_{U}^{*}} = {\text{?}\mspace{14mu} {g\left( {K_{U},\left\{ x_{i} \right\}} \right)}}}\mspace{79mu} {{{{step}\; 2}:\mspace{14mu} {\min\limits_{K \geq 0}\mspace{14mu} {f\left( {K,K_{U}^{*}} \right)}}},{{{s.t.\text{:}}\mspace{14mu} {r\left( {K,C_{j}} \right)}} \leq 0},{1 \leq j \leq C}}{\text{?}\text{indicates text missing or illegible when filed}}}} & (3)\end{matrix}$

where K ∈ R^(N×N) and K_(U) ∈ R^(N×N) are the final learned kernel matrix and the unsupervised kernel matrix, respectively, both of which are positive semi-definite (P.S.D.). Functions ƒ, g and r have the same meaning as in Equation (1) but are adapted to the kernel domain.

Since K_(U) and K are learned separately in Equation (3), how to choose the unsupervised kernel learning function g, and how to choose the regularizer ƒ and the rank-list constraint function r, are discussed as two parallel directions. In particular, handling the rank-list information C_(j) in r is important, considering that the learned kernel is applied to ranking and retrieval. For instance, one of the choices is to use the K-L divergence of permutation probability in r. Formally, given K, the distance d^(K) is able to be calculated as d^(K)(x_(i),x_(j))=Tr(K(e_(i)−e_(j))(e_(i)−e_(j))^(T)), where e_(i) is the vector whose i-th element is 1 and whose other elements are 0. Thus, the distances between all feedback and the query are {Tr(K(e_(j₀)−e_(j_(i)))(e_(j₀)−e_(j_(i)))^(T))}_(i=1)^(L_(j)) (for constraint C_(j)). Then, function r(K,C_(j)) is able to be represented as r({d^(K)_(j₀j_(i))}_(i=1)^(L_(j)), {j_(k) ≺ j_(l)}), where the first term in the function is the short notation for all distances between the L_(j) feedback items and the query, and the second term is the relative distance order information from C_(j) (j_(k) ≺ j_(l) for k<l).
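The kernel distance is able to be evaluated without forming the outer product, because Tr(K(e_(i)−e_(j))(e_(i)−e_(j))^(T)) expands to K_(ii)+K_(jj)−2K_(ij). The following sketch assumes NumPy, a toy linear kernel built from four one-dimensional points, and hypothetical helper names; it computes the distances from a query to its feedback list and checks whether the list order is respected.

```python
import numpy as np

def kernel_dist(K, i, j):
    """d^K(x_i, x_j) = Tr(K (e_i - e_j)(e_i - e_j)^T) = K_ii + K_jj - 2 K_ij."""
    return K[i, i] + K[j, j] - 2.0 * K[i, j]

def rank_list_distances(K, query, feedback):
    """Distances from the query to every feedback item, in list order."""
    return np.array([kernel_dist(K, query, f) for f in feedback])

def rank_list_satisfied(K, query, feedback):
    """True when the distances strictly increase along the rank list."""
    d = rank_list_distances(K, query, feedback)
    return bool(np.all(np.diff(d) > 0))

# Toy data: four points on a line with a linear kernel K = X X^T,
# so d^K reduces to the squared Euclidean distance.
X = np.array([[0.0], [1.0], [3.0], [6.0]])
K = X @ X.T

d = rank_list_distances(K, 0, [1, 2, 3])
ok = rank_list_satisfied(K, 0, [1, 2, 3])
bad = rank_list_satisfied(K, 0, [2, 1, 3])
```

The first list is ordered by increasing distance from the query and is reported as satisfied; swapping its first two items violates the order.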

It has been shown that, for the exponential score function, the score value (with distance used as the input) depends only on the relative distance information.

Based on the general formulation in Equation (3), a metric learning algorithm, Manifold-Aware Ranking Kernel (MARK), is able to be used. MARK employs the Bregman divergence (LogDet) regularizer. The supervised learning and the unsupervised manifold kernel learning are described herein, and the amplified commute time kernel is chosen to be the unsupervised kernel.

Supervised Kernel Learning with Relative Comparison Constraint

It is able to be shown that the rank-list constraint is able to be decomposed into multiple (standard) relative comparison constraints, for certain rank-list constraint functions. Thus, instead of using the C rank-list constraints {C_(j)}_(j=1)^(C), it is assumed there are C relative constraints, {[x_(j₀); x_(j₁), x_(j₂)]}_(j=1)^(C), where x_(j₁) is closer to x_(j₀) than x_(j₂) is (noted in short as j₁ ≺ j₂). By choosing the initial relative comparison margin and the LogDet divergence in the regularizer, the supervised learning part of the framework in Equation (3) is able to be presented as

$\begin{matrix}{\mspace{79mu} {{\text{?}\mspace{14mu} {D_{\varphi}\left( {K,K_{U}} \right)}}\mspace{79mu} {{{{s.t.\mspace{14mu} {d^{K}\left( {\text{?},\text{?}} \right)}} - {d^{K}\left( {\text{?},\text{?}} \right)}} \geq \xi},{1 \leq j \leq C}}{\text{?}\text{indicates text missing or illegible when filed}}}} & (4)\end{matrix}$

where d^(K) is the kernel distance function (given before) and D_(φ) is the Bregman matrix divergence D_(φ)(K₁,K₂)=φ(K₁)−φ(K₂)−tr((∇φ(K₂))^(T)(K₁−K₂)), where φ is specifically chosen as the LogDet function, e.g., φ(K)=−log det(K), which gives D_(φ)(K₁,K₂)=tr(K₁K₂⁻¹)−log det(K₁K₂⁻¹)−N.
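The LogDet divergence is able to be computed directly from its closed form. The sketch below assumes NumPy and strictly positive definite inputs (so that the inverse and the log-determinant exist); the function name `logdet_div` is hypothetical.

```python
import numpy as np

def logdet_div(K1, K2):
    """LogDet divergence tr(K1 K2^-1) - log det(K1 K2^-1) - N."""
    n = K1.shape[0]
    # solve(K2, K1) returns K2^-1 K1, which has the same trace and
    # determinant as K1 K2^-1 and avoids an explicit inverse.
    M = np.linalg.solve(K2, K1)
    sign, logabsdet = np.linalg.slogdet(M)
    if sign <= 0:
        raise ValueError("inputs must be positive definite")
    return float(np.trace(M) - logabsdet - n)

I3 = np.eye(3)
zero = logdet_div(I3, I3)            # a matrix has zero divergence from itself
scaled = logdet_div(2.0 * I3, I3)    # tr = 6, log det = 3 log 2
```

The divergence vanishes when both arguments coincide and is positive otherwise, which is why it is able to serve as a regularizer that keeps K close to K_(U).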

The constraints in Equation (4) are able to be reformulated as linear constraints enforced on the kernel by using the definition of the kernel distance function. Thus, a convex optimization problem is obtained which has a convex objective function and linear constraints as,

$\begin{matrix}{{\text{?}{D_{\varphi}\left( {K,K_{U}} \right)}}{{{s.t.\mspace{14mu} {{Tr}\left( {{{K\left( {\text{?},\text{?}} \right)}\left( {\text{?},\text{?}} \right)^{T}} - {{K\left( {\text{?}\text{?}} \right)}\left( {\text{?},\text{?}} \right)^{T}}} \right)}} \geq \xi},{1 \leq j \leq C}}{\text{?}\text{indicates text missing or illegible when filed}}} & (5)\end{matrix}$

where e_(j) is a constant vector in which all elements are 0 except the j-th element, which is 1. K_(U) is the unsupervised manifold kernel. ξ is the relative comparison (safe) margin which is to be satisfied.

Equation (5) appears to be a reasonable optimization framework; however, it still has potential problems for real applications. Firstly, there may not exist a feasible P.S.D. kernel K as the solution of Equation (5). Secondly, in certain cases, online users provide the relative comparison constraints, which may be incorrect. Lastly, for some examples x_(j₀), the discriminability between x_(j₀) and other samples is not inherent. Thus, requiring the constraints involving x_(j₀) (e.g., [x_(j₀); x_(j₁), x_(j₂)]) to satisfy the same margin as other constraints is not a reasonable choice.

By considering all three issues, slack variables and local margins are introduced to get a more practical optimization problem. To this end,

$\begin{matrix}{{{\text{?}{D_{\varphi}\left( {K,K_{U}} \right)}} + {\gamma \text{?}\left( {\overset{\rightarrow}{\xi},{\overset{\rightarrow}{\xi}}_{0}} \right)}}{{{s.t.\mspace{14mu} {{Tr}\left( {{{K\left( {\text{?},\text{?}} \right)}\left( {\text{?},\text{?}} \right)^{T}} - {{K\left( {\text{?}\text{?}} \right)}\left( {\text{?},\text{?}} \right)^{T}}} \right)}} \leq {\overset{\rightarrow}{\xi}(j)}},{1 \leq j \leq {C\text{?}\text{indicates text missing or illegible when filed}}}}} & (6)\end{matrix}$

where {right arrow over (ξ)}₀ is the originally given length-C margin vector and {right arrow over (ξ)} is the margin vector that is jointly optimized in Equation (6). The margin is changed to be negative in the framework of Equation (6), because the constraint is written as the distance to the nearer sample minus the distance to the farther sample.

Given an unsupervised learned kernel K_(U), the supervised kernel K is able to be optimized from Equation (6) by using the Bregman projection. In each iteration, the Bregman optimization method picks a constraint and performs the projection by solving the following equations,

$\begin{matrix}{\mspace{85mu} {\quad\left\{ {\begin{matrix}{{\nabla_{\varphi}\left( K_{t + 1} \right)} = {{\nabla_{\varphi}\left( K_{t} \right)} + {\alpha_{j}A_{j}}}} \\{{\nabla_{\varphi}\left( \xi_{t + 1} \right)} = {{\nabla_{\varphi}\left( {\overset{\rightarrow}{\xi}}_{t} \right)} - {\frac{\alpha_{j}}{\gamma}e_{i}}}} \\{{{Tr}\left( {K_{t + 1}A_{j}} \right)} = {e_{j}^{T}\text{?}}}\end{matrix}\text{?}\text{indicates text missing or illegible when filed}} \right.}} & (7)\end{matrix}$

where A_(j) ∈ R^(N×N) is the short notation for the constraint matrix (e_(j₀)−e_(j₁))(e_(j₀)−e_(j₁))^(T)−(e_(j₀)−e_(j₂))(e_(j₀)−e_(j₂))^(T), which is essentially a rank-2 matrix, and K_(t) ∈ R^(N×N) is the solution at the time-t iteration. The Bregman matrix divergence and the Bregman vector divergence are used for φ on K and on {right arrow over (ξ)}, respectively; thus, Equation (7) is able to be formulated as,

$\begin{matrix}{\quad\mspace{79mu} \left\{ {\begin{matrix}{K_{t + 1} = \left( {K_{t}^{- 1} - {\alpha_{j}A_{j}}} \right)^{- 1}} \\{{e_{j}^{T}\xi_{t + 1}^{\rightarrow}} = \frac{\gamma \; e_{j}^{T}{\overset{\rightarrow}{\xi}}_{t}}{\gamma + {\alpha_{j}e_{j}^{T}{\overset{\rightarrow}{\xi}}_{t}}}} \\{{{Tr}\left( {K_{t + 1}A_{j}} \right)} = {e_{j}^{T}\text{?}}}\end{matrix}\text{?}\text{indicates text missing or illegible when filed}} \right.} & (8)\end{matrix}$

where {right arrow over (ξ)}_(t) is the length-C margin vector in the time-t iteration and α_(j) is the updating parameter to be solved.

The core part of Equation (8) is the matrix inverse process K_(t+1)=(K_(t)⁻¹−α_(j)A_(j))⁻¹, which is essentially a rank-2 update (compare with the rank-1 update in information-theoretic metric learning (ITML) and low-rank kernel learning). Based on the Sherman-Morrison formula for the matrix inverse, the update is able to be written as,

$\begin{matrix}{K_{t + 1} = K_{t} + \frac{\alpha_{j}\left( 1 + \alpha_{j}q_{j} \right)}{\left( 1 - \alpha_{j}p_{j} \right)\left( 1 + \alpha_{j}q_{j} \right) + \alpha_{j}^{2}c_{j}^{2}}K_{t}z_{j}z_{j}^{T}K_{t} - \frac{\alpha_{j}\left( 1 - \alpha_{j}p_{j} \right)}{\left( 1 - \alpha_{j}p_{j} \right)\left( 1 + \alpha_{j}q_{j} \right) + \alpha_{j}^{2}c_{j}^{2}}K_{t}w_{j}w_{j}^{T}K_{t} - \frac{\alpha_{j}^{2}c_{j}}{\left( 1 - \alpha_{j}p_{j} \right)\left( 1 + \alpha_{j}q_{j} \right) + \alpha_{j}^{2}c_{j}^{2}}K_{t}\left( z_{j}w_{j}^{T} \right)K_{t} - \frac{\alpha_{j}^{2}c_{j}}{\left( 1 - \alpha_{j}p_{j} \right)\left( 1 + \alpha_{j}q_{j} \right) + \alpha_{j}^{2}c_{j}^{2}}K_{t}\left( w_{j}z_{j}^{T} \right)K_{t}} & (9)\end{matrix}$

where z_(j) and w_(j) are the short notations for (e_(j₀)−e_(j₁)) and (e_(j₀)−e_(j₂)). Also, p_(j), q_(j) and c_(j) are the short notations for z_(j)^(T)K_(t)z_(j), w_(j)^(T)K_(t)w_(j) and w_(j)^(T)K_(t)z_(j).
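As a sketch of the intermediate step, and assuming the Woodbury identity (the two-column generalization of the Sherman-Morrison formula), the coefficients of the rank-2 update are able to be derived as follows. The symbols U, S, M and Δ_(j) are introduced here only for illustration and are not part of the original notation.

$\begin{matrix}{A_{j} = USU^{T},\quad U = \left\lbrack z_{j},w_{j} \right\rbrack,\quad S = \mathrm{diag}(1, - 1)} \\ {K_{t + 1} = \left( K_{t}^{- 1} - \alpha_{j}USU^{T} \right)^{- 1} = K_{t} - K_{t}UM^{- 1}U^{T}K_{t},\quad M = U^{T}K_{t}U - \frac{1}{\alpha_{j}}S = \begin{pmatrix} p_{j} - \frac{1}{\alpha_{j}} & c_{j} \\ c_{j} & q_{j} + \frac{1}{\alpha_{j}} \end{pmatrix}} \\ {M^{- 1} = \frac{1}{\Delta_{j}}\begin{pmatrix} - \alpha_{j}\left( 1 + \alpha_{j}q_{j} \right) & \alpha_{j}^{2}c_{j} \\ \alpha_{j}^{2}c_{j} & \alpha_{j}\left( 1 - \alpha_{j}p_{j} \right) \end{pmatrix},\quad \Delta_{j} = \left( 1 - \alpha_{j}p_{j} \right)\left( 1 + \alpha_{j}q_{j} \right) + \alpha_{j}^{2}c_{j}^{2}}\end{matrix}$

Substituting M⁻¹ into K_(t)−K_(t)UM⁻¹U^(T)K_(t) gives the coefficient α_(j)(1+α_(j)q_(j))/Δ_(j) for K_(t)z_(j)z_(j)^(T)K_(t), the coefficient −α_(j)(1−α_(j)p_(j))/Δ_(j) for K_(t)w_(j)w_(j)^(T)K_(t), and the coefficient −α_(j)²c_(j)/Δ_(j) for each of the two cross terms.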

By combining Equations (9) and (8), the Bregman updating parameter α_(j) is able to be solved from the following quadratic equation,

$\begin{matrix}{{{\left\{ {\left( {{p_{j}q_{j}} - c_{j}^{2}} \right)\left( {{2\; e_{j}^{T}{\overset{\rightarrow}{\xi}}_{t}} + {\gamma \; e_{j}^{T}{\overset{\rightarrow}{\xi}}_{t}}} \right)} \right\} \alpha_{j}^{2}} + {\left\{ {{\left( {{2\; p_{j}q_{j}} - {2\; c_{j}^{2}}} \right)\gamma} + {\left( {p_{j} - q_{j}} \right)e_{j}^{T}{\overset{\rightarrow}{\xi}}_{t}} + {\left( {p_{j} - q_{j}} \right)\gamma \; e_{j}^{T}{\overset{\rightarrow}{\xi}}_{t}}} \right\} \alpha_{j}} + \left\{ {{\left( {p_{j} - q_{j}} \right)\gamma} - {\gamma \; e_{j}^{T}{\overset{\rightarrow}{\xi}}_{t}}} \right\}} = 0} & (10)\end{matrix}$

Equation (10) is a standard quadratic equation for α_(j).
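As a sketch of how the quadratic arises: after the rank-2 update, the constraint value has a closed form in p_(j), q_(j) and c_(j); equating it with the updated margin from Equation (8) and clearing the denominators gives Equation (10). The shorthand D_(j)=p_(j)q_(j)−c_(j)² and ξ_(j)=e_(j)^(T){right arrow over (ξ)}_(t) is introduced here only for compactness.

$\begin{matrix}{\mathrm{Tr}\left( K_{t + 1}A_{j} \right) = \frac{\left( p_{j} - q_{j} \right) + 2\alpha_{j}D_{j}}{1 - \alpha_{j}\left( p_{j} - q_{j} \right) - \alpha_{j}^{2}D_{j}} = \frac{\gamma\xi_{j}}{\gamma + \alpha_{j}\xi_{j}}} \\ {\Rightarrow\; D_{j}\left( 2\xi_{j} + \gamma\xi_{j} \right)\alpha_{j}^{2} + \left\{ 2D_{j}\gamma + \left( p_{j} - q_{j} \right)\xi_{j} + \left( p_{j} - q_{j} \right)\gamma\xi_{j} \right\}\alpha_{j} + \left\{ \left( p_{j} - q_{j} \right)\gamma - \gamma\xi_{j} \right\} = 0}\end{matrix}$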

The in-depth analysis of this Bregman projection is given herein; it is assumed that Equation (10) always has two solutions, and the smaller one is used in the updating process. The complete updating process in one iteration is given as follows,

$\begin{matrix}{\quad\mspace{79mu} \left\{ {\left. \left\lbrack {{\alpha_{j}(1)}{\alpha_{j}(2)}} \right\rbrack\leftarrow{\alpha \left( {p_{j},q_{j},c_{j},\gamma,{e_{j}^{T}{\overset{\rightarrow}{\xi}}_{t}}} \right)} \right.,\mspace{79mu} {\alpha_{j} = {\min\left( \left\lbrack {{\alpha_{j}(1)}{\alpha_{j}(2)}\mspace{79mu} \left\{ {\beta_{x},\beta_{w},\beta_{zw},\beta_{wz}} \right\}}\leftarrow{{\beta \left( {\alpha_{j},p_{j},q_{j},c_{j},\gamma} \right)}K_{t + 1}}\leftarrow{K_{t} + {\beta_{z}K_{t}z_{j}z_{j}^{T}K_{t}} + {\beta_{w}K_{t}z_{j}z_{j}^{T}K_{t}} + {\beta_{zw}K_{t}z_{j}z_{j}^{T}K_{t}} + {\beta_{wz}K_{t}z_{j}z_{j}^{T}K_{t}\mspace{79mu} \lambda_{j}}}\leftarrow{\lambda_{j} - {\alpha_{j}\mspace{79mu} e_{j}^{T}\xi_{t + 1}}}\leftarrow\frac{\gamma \; e_{j}^{T}{\overset{\rightarrow}{\xi}}_{t}}{\gamma + {\alpha_{j}e_{j}^{T}{\overset{\rightarrow}{\xi}}_{t}}} \right. \right.}}} \right.} & (11)\end{matrix}$

The newly introduced variable λ_(j) is the dual variable corresponding to constraint j. λ_(j) is required to be non-negative to satisfy the K.K.T. condition for the optimization process. As with the margin vector {right arrow over (ξ)}_(t), a length-C dual variable vector {right arrow over (λ)}_(t) at the time-t iteration is able to be used. The updating process is stopped when {right arrow over (λ)}_(t) has converged.
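One iteration of the updating process is able to be sketched in Python as follows. The sketch assumes NumPy, a symmetric positive definite K_(t), and that the quadratic has two real roots; the function names `solve_alpha` and `bregman_step` are hypothetical. The rank-2 coefficients are those obtained by applying the Woodbury identity to K_(t)⁻¹−α_(j)A_(j), and no handling of the non-negativity requirement on λ_(j) is included.

```python
import numpy as np

def solve_alpha(p, q, c, gamma, xi):
    """Smaller real root of the quadratic in alpha_j (Equation (10))."""
    det = p * q - c * c
    a2 = det * (2.0 * xi + gamma * xi)
    a1 = 2.0 * det * gamma + (p - q) * xi + (p - q) * gamma * xi
    a0 = (p - q) * gamma - gamma * xi
    disc = a1 * a1 - 4.0 * a2 * a0
    if a2 == 0.0 or disc < 0.0:
        raise ValueError("quadratic is degenerate or has no real root")
    root = np.sqrt(disc)
    return min((-a1 - root) / (2.0 * a2), (-a1 + root) / (2.0 * a2))

def bregman_step(K, xi, lam, j, triplet, gamma):
    """Project onto constraint j = (j0, j1, j2); returns updated copies."""
    j0, j1, j2 = triplet
    n = K.shape[0]
    z = np.zeros(n)
    z[j0] += 1.0
    z[j1] -= 1.0                       # z_j = e_j0 - e_j1
    w = np.zeros(n)
    w[j0] += 1.0
    w[j2] -= 1.0                       # w_j = e_j0 - e_j2
    Kz, Kw = K @ z, K @ w
    p, q, c = z @ Kz, w @ Kw, w @ Kz
    alpha = solve_alpha(p, q, c, gamma, xi[j])

    # Coefficients of the rank-2 update of K.
    delta = (1.0 - alpha * p) * (1.0 + alpha * q) + alpha ** 2 * c ** 2
    b_z = alpha * (1.0 + alpha * q) / delta
    b_w = -alpha * (1.0 - alpha * p) / delta
    b_cross = -alpha ** 2 * c / delta
    # K is symmetric, so K z z^T K equals outer(Kz, Kz), and so on.
    K_new = (K + b_z * np.outer(Kz, Kz) + b_w * np.outer(Kw, Kw)
             + b_cross * (np.outer(Kz, Kw) + np.outer(Kw, Kz)))

    xi_new = xi.copy()
    xi_new[j] = gamma * xi[j] / (gamma + alpha * xi[j])
    lam_new = lam.copy()
    lam_new[j] = lam[j] - alpha
    return K_new, xi_new, lam_new, alpha

# Toy run: identity start kernel, one constraint with negative margin -1.
K0 = np.eye(3)
K1, xi1, lam1, alpha = bregman_step(
    K0, np.array([-1.0]), np.zeros(1), 0, (0, 1, 2), gamma=1.0)
```

In this toy run the projected kernel satisfies the selected constraint with equality, i.e., Tr(K_(t+1)A_(j)) equals the updated margin entry, and it remains positive definite. A complete solver would cycle over all C constraints until the dual variables stop changing.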

Unsupervised Manifold Kernel Learning

Choosing a proper unsupervised manifold kernel K_(U) is another important feature of the MARK algorithm. The amplified commute time kernel (ACK) is used as K_(U) in Equation (6).

The ACK kernel is induced from the amplified commute time distance,

$\begin{matrix}{{{K_{ACK}\left( {i,j} \right)} = \frac{\overset{\_}{K_{ACK}\left( {i,j} \right)}}{\left( {{K_{ACK}\left( {i,i} \right)}{K_{ACK}\left( {j,j} \right)}} \right)^{2}}}{\overset{\_}{K_{ACK}} = {{- \left( {I - {\frac{1}{N}1_{N}1_{N}^{T}}} \right)}{C_{ACD}\left( {I - {\frac{1}{N}1_{N}1_{N}^{T}}} \right)}}}} & (12)\end{matrix}$

where C_(ACD) is the amplified commute time distance matrix and is able to be calculated as,

$\begin{matrix}{C_{ACD}\left( i,j \right) = \left\{ \begin{matrix}{S_{ij} + u_{ij},} & {i \neq j} \\ {0,} & {i = j}\end{matrix} \right.\quad \text{with}\quad \left\{ \begin{matrix}{S_{ij} = R_{ij} - \frac{1}{d_{i}} - \frac{1}{d_{j}}} \\ {u_{ij} = \frac{2w_{ij}}{d_{i}d_{j}} - \frac{w_{ii}}{d_{i}^{2}} - \frac{w_{jj}}{d_{j}^{2}}}\end{matrix} \right.} & (13)\end{matrix}$

where R_(ij) is the resistance distance of the random walk between node i and node j on the data graph, w_(ij) is the weight of the edge between node i and node j, and d_(i) is the degree of node i. Resistance distance is tightly connected with the commute time distance,

C _(ij)=vol(G)R _(ij)  (14)

where C_(ij) is the commute time distance between node i and node j, vol(G) is the volume of the graph, and G is the undirected weighted graph, which is built from the data. In particular, the commute time distance is able to be calculated from the pseudo-inverse of the Laplacian matrix of graph G,

C≡L ^(†)=(D−W)^(†)  (15)

where L is the unnormalized graph Laplacian matrix for graph G, D is the diagonal degree matrix of G, and W is the weight matrix of G.
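The chain from the weight matrix to the centered kernel of Equation (12) is able to be sketched as follows. The sketch assumes NumPy, a connected undirected graph with a symmetric weight matrix, and the standard identity R_(ij)=L^(†)_(ii)+L^(†)_(jj)−2L^(†)_(ij) for the resistance distance; the function name is hypothetical, and the final normalization step of Equation (12) is intentionally left out.

```python
import numpy as np

def amplified_commute_kernel_centered(W):
    """Return (K_bar, C_acd) for a symmetric weight matrix W.

    K_bar is the centered, un-normalized kernel -H C_acd H with
    H = I - (1/N) 1 1^T; C_acd is the amplified commute time distance.
    """
    W = np.asarray(W, dtype=float)
    n = W.shape[0]
    d = W.sum(axis=1)                           # node degrees
    L_pinv = np.linalg.pinv(np.diag(d) - W)     # pseudo-inverse of L = D - W
    diag = np.diag(L_pinv)
    R = diag[:, None] + diag[None, :] - 2.0 * L_pinv   # resistance distance

    S = R - 1.0 / d[:, None] - 1.0 / d[None, :]
    self_term = np.diag(W) / d ** 2             # w_ii / d_i^2
    U = 2.0 * W / np.outer(d, d) - self_term[:, None] - self_term[None, :]
    C_acd = S + U
    np.fill_diagonal(C_acd, 0.0)                # distance of a node to itself

    H = np.eye(n) - np.ones((n, n)) / n         # centering matrix
    return -H @ C_acd @ H, C_acd

# Path graph 0 - 1 - 2 with unit weights.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
K_bar, C_acd = amplified_commute_kernel_centered(W)
```

For large data sets the dense pseudo-inverse is costly, so a sparse nearest-neighbor graph and an iterative solver would be a natural substitution; that choice is outside the scope of this sketch.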

FIG. 1 illustrates a flowchart of a method of manifold-aware ranking kernel learning according to some embodiments. In the step 100, supervised kernel learning is performed. In some embodiments, the Bregman optimization method is utilized in the supervised kernel learning. In the step 102, unsupervised manifold kernel learning is performed. In the step 104, a metric kernel is generated from the supervised and unsupervised manifold kernel learning. In some embodiments, fewer or additional steps are implemented. In some embodiments, the order of the steps is modified.

FIG. 2 illustrates a flowchart of a method of information retrieval according to some embodiments. In the step 200, a search query input is received. For example, a user provides an image or text to search for. In the step 202, a search is performed using the metric kernel learned by manifold-aware ranking kernel learning. In the step 204, results are presented. The results are able to be in any presentation format (e.g., a list of hyperlinks, a table of images, a single image). In some embodiments, fewer or additional steps are implemented. In some embodiments, the order of the steps is modified.
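The search of step 202 is able to be sketched as a ranking by kernel distance. The sketch assumes NumPy and that the query is one of the N points already indexed by the learned kernel matrix K; the function name `retrieve` and the parameter `top_k` are hypothetical.

```python
import numpy as np

def retrieve(K, query_idx, top_k=5):
    """Return indices of the top_k entities closest to the query under d^K."""
    K = np.asarray(K, dtype=float)
    diag = np.diag(K)
    # d^K(query, i) = K_qq + K_ii - 2 K_qi for every entity i at once.
    dist = diag[query_idx] + diag - 2.0 * K[query_idx]
    dist[query_idx] = np.inf            # never return the query itself
    order = np.argsort(dist, kind="stable")
    return order[:top_k].tolist()

# Toy kernel: four points on a line with a linear kernel.
X = np.array([[0.0], [1.0], [3.0], [6.0]])
K = X @ X.T
nearest_to_first = retrieve(K, 0, top_k=2)
nearest_to_last = retrieve(K, 3, top_k=1)
```

The returned indices are able to be mapped back to database entities and presented in any of the formats mentioned above.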

FIG. 3 illustrates a block diagram of an exemplary computing device configured to implement the manifold-aware ranking kernel for information retrieval method according to some embodiments. The computing device 300 is able to be used to acquire, store, compute, process, communicate and/or display information such as text, images and videos. In general, a hardware structure suitable for implementing the computing device 300 includes a network interface 302, a memory 304, a processor 306, I/O device(s) 308, a bus 310 and a storage device 312. The choice of processor is not critical as long as a suitable processor with sufficient speed is chosen. The memory 304 is able to be any conventional computer memory known in the art. The storage device 312 is able to include a hard drive, CDROM, CDRW, DVD, DVDRW, Blu-ray®, flash memory card or any other storage device. The computing device 300 is able to include one or more network interfaces 302. An example of a network interface includes a network card connected to an Ethernet or other type of LAN. The I/O device(s) 308 are able to include one or more of the following: keyboard, mouse, monitor, screen, printer, modem, touchscreen, button interface and other devices. Information retrieval application(s) 330 used to perform the information retrieval method are likely to be stored in the storage device 312 and memory 304 and processed as applications are typically processed. More or fewer components than shown in FIG. 3 are able to be included in the computing device 300. In some embodiments, information retrieval hardware 320 is included. Although the computing device 300 in FIG. 3 includes applications 330 and hardware 320 for the information retrieval method, the information retrieval method is able to be implemented on a computing device in hardware, firmware, software or any combination thereof. For example, in some embodiments, the information retrieval applications 330 are programmed in a memory and executed using a processor. In another example, in some embodiments, the information retrieval hardware 320 is programmed hardware logic including gates specifically designed to implement the information retrieval method.

In some embodiments, the information retrieval application(s) 330include several applications and/or modules. In some embodiments,modules include one or more sub-modules as well. In some embodiments,fewer or additional modules are able to be included.

Examples of suitable computing devices include a personal computer, a laptop computer, a computer workstation, a server, a mainframe computer, a handheld computer, a personal digital assistant, a cellular/mobile telephone, a smart appliance, a gaming console, a digital camera, a digital camcorder, a camera phone, a smart phone, a portable music player, a tablet computer, a mobile device, a video player, a video disc writer/player (e.g., DVD writer/player, Blu-ray® writer/player), a television, a home entertainment system or any other suitable computing device.

To utilize the information retrieval method, a device or several devices are used to perform MARK on a data set. MARK is able to be performed automatically. Using the results of MARK, a user is able to search for and retrieve information. In operation, the MARK algorithm described herein has the following benefits:

- Ranking-oriented: The relative comparison formulation directly targets the ranking problem, making the approach well suited to information retrieval.
- Manifold-aware: The algorithm is able to exploit information from ample unlabeled data, which helps to improve generalization performance, especially when there are a limited number of labeled constraints.
- Nonlinear: As a kernel-based approach, the algorithm is able to produce a highly non-linear metric which is able to model complicated data distributions.

Some Embodiments of Manifold-Aware Ranking Kernel for Information Retrieval

-   1. A method of manifold-aware ranking kernel learning programmed in a memory of a device comprising:
    -   a. performing combined supervised kernel learning and unsupervised manifold kernel learning; and
    -   b. generating a non-linear kernel model.
-   2. The method of clause 1 wherein Bregman projection is utilized when performing the supervised kernel learning.
-   3. The method of clause 1 wherein unlabeled data is utilized in the unsupervised manifold kernel learning.
-   4. The method of clause 1 wherein the result comprises a non-linear metric defined by a kernel model.
-   5. The method of clause 1 wherein the supervised kernel learning employs a relative comparison constraint.
-   6. The method of clause 1 wherein the device is selected from the group consisting of a personal computer, a laptop computer, a computer workstation, a server, a mainframe computer, a handheld computer, a personal digital assistant, a cellular/mobile telephone, a smart phone, a smart appliance, a gaming console, a digital camera, a digital camcorder, a camera phone, a portable music player, a tablet computer, a video player, a DVD writer/player, a high definition video writer/player, a television and a home entertainment system.
-   7. A method of information retrieval programmed in a memory of a device comprising:
    -   a. receiving a search query input;
    -   b. performing a search based on the search query input and using a metric kernel learned by manifold-aware ranking kernel learning; and
    -   c. presenting a search result of the search.
-   8. The method of clause 7 wherein manifold-aware ranking kernel learning comprises:
    -   i. performing combined supervised kernel learning and unsupervised manifold kernel learning; and
    -   ii. generating a non-linear kernel model.
-   9. The method of clause 8 wherein Bregman projection is utilized when performing the supervised kernel learning.
-   10. The method of clause 8 wherein unlabeled data is utilized in the unsupervised manifold kernel learning.
-   11. The method of clause 8 wherein the result comprises a non-linear metric defined by a kernel model.
-   12. The method of clause 8 wherein the supervised kernel learning employs a relative comparison constraint.
-   13. The method of clause 7 wherein the search result comprises a set of entities from a database that are similar to the search query input.
-   14. The method of clause 7 wherein the device is selected from the group consisting of a personal computer, a laptop computer, a computer workstation, a server, a mainframe computer, a handheld computer, a personal digital assistant, a cellular/mobile telephone, a smart phone, a smart appliance, a gaming console, a digital camera, a digital camcorder, a camera phone, a portable music player, a tablet computer, a video player, a DVD writer/player, a high definition video writer/player, a television and a home entertainment system.
-   15. An apparatus comprising:
    -   a. a non-transitory memory for storing an application, the application for:
        -   i. performing combined supervised kernel learning and unsupervised manifold kernel learning; and
        -   ii. generating a non-linear kernel model; and
    -   b. a processing component coupled to the memory, the processing component configured for processing the application.
-   16. The apparatus of clause 15 wherein Bregman projection is utilized when performing the supervised kernel learning.
-   17. The apparatus of clause 15 wherein unlabeled data is utilized in the unsupervised manifold kernel learning.
-   18. The apparatus of clause 15 wherein the result comprises a non-linear metric defined by a kernel model.
-   19. The apparatus of clause 15 wherein the supervised kernel learning employs a relative comparison constraint.
-   20. An apparatus comprising:
    -   a. a non-transitory memory for storing an application, the application for:
        -   i. receiving a search query input;
        -   ii. performing a search based on the search query input and using a metric kernel learned by manifold-aware ranking kernel learning; and
        -   iii. presenting a search result of the search; and
    -   b. a processing component coupled to the memory, the processing component configured for processing the application.
-   21. The apparatus of clause 20 wherein manifold-aware ranking kernel learning comprises:
    -   i. performing combined supervised kernel learning and unsupervised manifold kernel learning; and
    -   ii. generating a non-linear kernel model.
-   22. The apparatus of clause 21 wherein Bregman projection is utilized when performing the supervised kernel learning.
-   23. The apparatus of clause 21 wherein unlabeled data is utilized in the unsupervised manifold kernel learning.
-   24. The apparatus of clause 21 wherein the result comprises a non-linear metric defined by a kernel model.
-   25. The apparatus of clause 21 wherein the supervised kernel learning employs a relative comparison constraint.
-   26. The apparatus of clause 20 wherein the search result comprises a set of entities from a database that are similar to the search query input.
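
By way of illustration only, the retrieval steps recited above (receiving a query, searching with a kernel-defined metric, and presenting similar entities) and the relative comparison constraint may be sketched as follows. This is a minimal sketch under stated assumptions, not the learned MARK kernel itself: a fixed RBF kernel stands in for the learned kernel, and the names `rbf_kernel`, `kernel_distance`, `rank_database` and `violated_constraints` are hypothetical. The only relation relied upon is the standard kernel-induced squared distance d(x, y) = K(x, x) + K(y, y) - 2K(x, y).

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Stand-in kernel (assumption): a fixed RBF kernel, not a learned one."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)


def kernel_distance(k, x, y):
    """Squared distance induced by kernel k: K(x,x) + K(y,y) - 2K(x,y)."""
    return k(x, x) + k(y, y) - 2.0 * k(x, y)


def rank_database(k, query, database):
    """Return database indices ordered from most to least similar to the query."""
    scored = [(kernel_distance(k, query, item), idx)
              for idx, item in enumerate(database)]
    scored.sort()
    return [idx for _, idx in scored]


def violated_constraints(k, points, triplets):
    """Count relative comparison constraints that the kernel fails to satisfy.

    Each triplet (i, j, l) asserts that point i is closer to point j
    than to point l under the kernel-induced distance.
    """
    count = 0
    for i, j, l in triplets:
        d_ij = kernel_distance(k, points[i], points[j])
        d_il = kernel_distance(k, points[i], points[l])
        if d_ij >= d_il:
            count += 1
    return count


if __name__ == "__main__":
    # Hypothetical toy database of two-dimensional feature vectors.
    database = [(0.0, 0.0), (1.0, 1.0), (3.0, 3.0)]
    query = (0.9, 1.1)
    print(rank_database(rbf_kernel, query, database))
    print(violated_constraints(rbf_kernel, database, [(0, 1, 2), (0, 2, 1)]))
```

In such a sketch, a learned kernel would simply replace `rbf_kernel` as the argument `k`; the ranking and constraint-checking steps would remain unchanged.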

The present invention has been described in terms of specific embodiments incorporating details to facilitate the understanding of principles of construction and operation of the invention. Such reference herein to specific embodiments and details thereof is not intended to limit the scope of the claims appended hereto. It will be readily apparent to one skilled in the art that other various modifications may be made in the embodiment chosen for illustration without departing from the spirit and scope of the invention as defined by the claims.

What is claimed is:
 1. A method of manifold-aware ranking kernel learning programmed in a memory of a device comprising: a. performing combined supervised kernel learning and unsupervised manifold kernel learning; and b. generating a non-linear kernel model.
 2. The method of claim 1 wherein Bregman projection is utilized when performing the supervised kernel learning.
 3. The method of claim 1 wherein unlabeled data is utilized in the unsupervised manifold kernel learning.
 4. The method of claim 1 wherein the result comprises a non-linear metric defined by a kernel model.
 5. The method of claim 1 wherein the supervised kernel learning employs a relative comparison constraint.
 6. The method of claim 1 wherein the device is selected from the group consisting of a personal computer, a laptop computer, a computer workstation, a server, a mainframe computer, a handheld computer, a personal digital assistant, a cellular/mobile telephone, a smart phone, a smart appliance, a gaming console, a digital camera, a digital camcorder, a camera phone, a portable music player, a tablet computer, a video player, a DVD writer/player, a high definition video writer/player, a television and a home entertainment system.
 7. A method of information retrieval programmed in a memory of a device comprising: a. receiving a search query input; b. performing a search based on the search query input and using a metric kernel learned by manifold-aware ranking kernel learning; and c. presenting a search result of the search.
 8. The method of claim 7 wherein manifold-aware ranking kernel learning comprises: i. performing combined supervised kernel learning and unsupervised manifold kernel learning; and ii. generating a non-linear kernel model.
 9. The method of claim 8 wherein Bregman projection is utilized when performing the supervised kernel learning.
 10. The method of claim 8 wherein unlabeled data is utilized in the unsupervised manifold kernel learning.
 11. The method of claim 8 wherein the result comprises a non-linear metric defined by a kernel model.
 12. The method of claim 8 wherein the supervised kernel learning employs a relative comparison constraint.
 13. The method of claim 7 wherein the search result comprises a set of entities from a database that are similar to the search query input.
 14. The method of claim 7 wherein the device is selected from the group consisting of a personal computer, a laptop computer, a computer workstation, a server, a mainframe computer, a handheld computer, a personal digital assistant, a cellular/mobile telephone, a smart phone, a smart appliance, a gaming console, a digital camera, a digital camcorder, a camera phone, a portable music player, a tablet computer, a video player, a DVD writer/player, a high definition video writer/player, a television and a home entertainment system.
 15. An apparatus comprising: a. a non-transitory memory for storing an application, the application for: i. performing combined supervised kernel learning and unsupervised manifold kernel learning; and ii. generating a non-linear kernel model; and b. a processing component coupled to the memory, the processing component configured for processing the application.
 16. The apparatus of claim 15 wherein Bregman projection is utilized when performing the supervised kernel learning.
 17. The apparatus of claim 15 wherein unlabeled data is utilized in the unsupervised manifold kernel learning.
 18. The apparatus of claim 15 wherein the result comprises a non-linear metric defined by a kernel model.
 19. The apparatus of claim 15 wherein the supervised kernel learning employs a relative comparison constraint.
 20. An apparatus comprising: a. a non-transitory memory for storing an application, the application for: i. receiving a search query input; ii. performing a search based on the search query input and using a metric kernel learned by manifold-aware ranking kernel learning; and iii. presenting a search result of the search; and b. a processing component coupled to the memory, the processing component configured for processing the application.
 21. The apparatus of claim 20 wherein manifold-aware ranking kernel learning comprises: i. performing combined supervised kernel learning and unsupervised manifold kernel learning; and ii. generating a non-linear kernel model.
 22. The apparatus of claim 21 wherein Bregman projection is utilized when performing the supervised kernel learning.
 23. The apparatus of claim 21 wherein unlabeled data is utilized in the unsupervised manifold kernel learning.
 24. The apparatus of claim 21 wherein the result comprises a non-linear metric defined by a kernel model.
 25. The apparatus of claim 21 wherein the supervised kernel learning employs a relative comparison constraint.
 26. The apparatus of claim 20 wherein the search result comprises a set of entities from a database that are similar to the search query input.